Results 1 - 20 of 66
1.
Nat Commun ; 15(1): 132, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167256

ABSTRACT

Copy number variants (CNVs) have been shown to contribute to the etiology of several genetic disorders. Accurate detection of CNVs on whole exome sequencing (WES) data has been a long sought-after goal for use in clinics. Despite recent improvements in performance, this has not been possible because algorithms mostly suffer from low precision and even lower recall on expert-curated gold-standard call sets. Here, we present a deep learning-based somatic and germline CNV caller for WES data, named ECOLE. Based on a variant of the transformer architecture, the model learns to call CNVs per exon, using high-confidence calls made on matched WGS samples. We further train and fine-tune the model with a small set of expert calls via transfer learning. We show that ECOLE achieves high performance on human expert-labelled data for the first time, with 68.7% precision and 49.6% recall. This corresponds to precision and recall improvements of 18.7% and 30.8% over the next best-performing methods, respectively. We also show that the same fine-tuning strategy using tumor samples enables ECOLE to detect RT-qPCR-validated variations in bladder cancer samples without the need for a control sample. ECOLE is available at https://github.com/ciceklab/ECOLE .


Subjects
DNA Copy Number Variations , Exome , Humans , Exome Sequencing , Exome/genetics , Algorithms , Exons , High-Throughput Nucleotide Sequencing/methods
2.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37952175

ABSTRACT

MOTIVATION: Online assessment of tumor characteristics during surgery is important and has the potential to establish an intra-operative surgeon feedback mechanism. With such feedback available, surgeons could decide to be more liberal or conservative in resecting the tumor. While there are methods to perform metabolomics-based tumor pathology prediction, their model complexity and predictive performance are limited by the small dataset sizes. Furthermore, the information conveyed by the feedback on the tumor tissue could be improved in both content and accuracy. RESULTS: In this study, we propose a metabolic pathway-informed deep learning model (PiDeeL) to perform survival analysis and pathology assessment based on metabolite concentrations. We show that incorporating pathway information into the model architecture substantially reduces parameter complexity and achieves better survival analysis and pathological classification performance. With these design decisions, PiDeeL improves on the state-of-the-art tumor pathology prediction performance by 3.38% in the Area Under the ROC Curve and by 4.06% in the Area Under the Precision-Recall Curve. Similarly, with respect to the time-dependent concordance index (c-index), PiDeeL achieves better survival analysis performance than the state-of-the-art (an improvement of 4.3%). Moreover, we show that importance analyses performed on input metabolite features as well as on pathway-specific neurons of PiDeeL provide insights into tumor metabolism. We foresee that using this model in the operating room will help surgeons adjust the surgery plan on the fly and will result in better prognosis estimates tailored to surgical procedures. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/PiDeeL. The data used in this study are released at https://zenodo.org/record/7228791.
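A pathway-informed layer can be pictured as a dense layer with most connections removed. Each pathway neuron receives input only from the metabolites annotated to that pathway, which is one way such a design could reduce parameter complexity. The following is a minimal sketch under that assumption and is not PiDeeL's actual implementation. The metabolite names, the pathway memberships and the unit weights are hypothetical, and biases are omitted.

```python
from typing import Dict, List, Tuple

def pathway_layer(
    x: Dict[str, float],
    pathways: Dict[str, List[str]],
    weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Compute one sparse layer: each pathway neuron sees only its member metabolites."""
    out: Dict[str, float] = {}
    for pathway, members in pathways.items():
        # Weighted sum restricted to the metabolites annotated to this pathway.
        z = sum(weights[pathway][m] * x[m] for m in members)
        out[pathway] = max(0.0, z)  # ReLU activation
    return out

def count_params(pathways: Dict[str, List[str]], n_metabolites: int) -> Tuple[int, int]:
    """Return (sparse, dense) weight counts: pathway-masked layer vs. fully connected layer."""
    sparse = sum(len(members) for members in pathways.values())
    dense = len(pathways) * n_metabolites
    return sparse, dense

# Hypothetical standardized metabolite features and pathway annotations.
x = {"lactate": 2.0, "glutamine": 1.0, "glutamate": -3.0, "choline": 0.5}
pathways = {
    "glycolysis": ["lactate"],
    "glutamine_metabolism": ["glutamine", "glutamate"],
    "phospholipid_metabolism": ["choline"],
}
weights = {p: {m: 1.0 for m in ms} for p, ms in pathways.items()}

print(pathway_layer(x, pathways, weights))
print(count_params(pathways, len(x)))  # (4, 12)
```

With three hypothetical pathways over four metabolites, the masked layer keeps 4 weights instead of 12. The saving grows with the number of metabolites when each pathway covers only a few of them.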


Subjects
Deep Learning , Glioma , Humans , Metabolic Networks and Pathways , Survival Analysis , Area Under Curve
3.
Article in English | MEDLINE | ID: mdl-34995191

ABSTRACT

Drug failures due to unforeseen adverse effects in clinical trials pose health risks for the participants and lead to substantial financial losses. Side effect prediction algorithms have the potential to guide the drug design process. The LINCS L1000 dataset provides a vast resource of cell line gene expression data perturbed by different drugs and creates a knowledge base for context-specific features. The state-of-the-art approach that aims at using context-specific information relies only on the high-quality experiments in LINCS L1000 and discards a large portion of the experiments. In this study, our goal is to boost prediction performance by utilizing this data to its full extent. We experiment with 5 deep learning architectures. We find that a multi-modal architecture produces the best predictive performance among multi-layer perceptron-based architectures when drug chemical structure (CS) and the full set of drug-perturbed gene expression profiles (GEX) are used as modalities. Overall, we observe that CS is more informative than GEX. A convolutional neural network-based model that uses only the SMILES string representation of the drugs achieves the best results and provides 13.0% macro-AUC and 3.1% micro-AUC improvements over the state-of-the-art. We also show that the model is able to predict side effect-drug pairs that are reported in the literature but were missing from the ground truth side effect dataset. DeepSide is available at http://github.com/OnurUner/DeepSide.
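Macro-AUC and micro-AUC can diverge, which is why the abstract reports the two improvements separately. The sketch below is a generic illustration and not DeepSide's evaluation code. It computes ROC AUC as the probability that a randomly chosen positive outscores a randomly chosen negative. Macro-AUC averages the per-side-effect AUCs. Micro-AUC pools every drug-side effect entry into one ranking. The label and score matrices are hypothetical.

```python
from typing import List, Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison; a positive-negative tie counts as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_auc(S: List[List[float]], Y: List[List[int]]) -> float:
    """Mean of per-column (per side effect) AUCs; single-class columns are skipped."""
    aucs = []
    for j in range(len(Y[0])):
        col_y = [row[j] for row in Y]
        if 0 < sum(col_y) < len(col_y):
            aucs.append(auc([row[j] for row in S], col_y))
    return sum(aucs) / len(aucs)

def micro_auc(S: List[List[float]], Y: List[List[int]]) -> float:
    """AUC over all (drug, side effect) entries pooled into a single ranking."""
    return auc([s for row in S for s in row], [y for row in Y for y in row])

# Rows are drugs, columns are side effects (hypothetical labels and scores).
Y = [[1, 0], [0, 1], [1, 0], [0, 0]]
S = [[0.9, 0.2], [0.4, 0.3], [0.8, 0.1], [0.3, 0.25]]
print(macro_auc(S, Y), micro_auc(S, Y))
```

Each side effect is ranked perfectly on its own here, so macro-AUC is 1.0. Micro-AUC drops to 0.9 because the positive score for the second side effect (0.3) falls below a negative score for the first (0.4).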


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Neural Networks, Computer , Algorithms , Drug-Related Side Effects and Adverse Reactions/genetics , Cell Line
4.
Patterns (N Y) ; 3(7): 100524, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35845835

ABSTRACT

Autism spectrum disorder and intellectual disability are comorbid neurodevelopmental disorders with complex genetic architectures. Despite large-scale sequencing studies, only a fraction of the risk genes has been identified for both. We present a network-based gene risk prioritization algorithm, DeepND, that performs cross-disorder analysis to improve prediction by exploiting the comorbidity of autism spectrum disorder (ASD) and intellectual disability (ID) via multitask learning. Our model leverages information from human brain gene co-expression networks using graph convolutional networks, learning which spatiotemporal neurodevelopmental windows are important for disorder etiologies and improving on the state-of-the-art prediction in single- and cross-disorder settings. DeepND identifies the prefrontal and motor-somatosensory cortex (PFC-MFC) brain region and the periods from early to mid-fetal and from early childhood to young adulthood as the highest neurodevelopmental risk windows for ASD and ID. We investigate ASD- and ID-associated copy number variation (CNV) regions and report our findings for several susceptibility gene candidates. DeepND can be generalized to analyze any combination of comorbid disorders.

5.
Genome Res ; 32(6): 1170-1182, 2022 06.
Article in English | MEDLINE | ID: mdl-35697522

ABSTRACT

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.


Subjects
Deep Learning , Exome , Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Exome Sequencing
6.
Bioinformatics ; 38(16): 3935-3941, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762943

ABSTRACT

MOTIVATION: Synthesizing genes to be expressed in other organisms is an essential tool in biotechnology. While the many-to-one mapping from codons to amino acids makes the genetic code degenerate, codon usage in a particular organism is not random either. This bias in codon use may have a remarkable effect on the level of gene expression. A number of measures have been developed to quantify a given codon sequence's strength to express a gene in a host organism. Codon optimization aims to find a codon sequence that optimizes one or more of these measures. Efficient computational approaches are needed since the number of possible codon sequences grows exponentially with the number of amino acids. RESULTS: We develop a unifying modeling approach for codon optimization. With our mathematical formulations based on graph/network representations of amino acid sequences, any combination of measures can be optimized in the same framework by finding a path satisfying additional limitations in an acyclic layered network. We tested our approach on bi-objective problems commonly used in the literature, namely, Codon Pair Bias versus Codon Adaptation Index and Relative Codon Pair Bias versus Relative Codon Bias. However, our framework is general enough to handle any number of objectives concurrently, with certain restrictions or preferences on the use of specific nucleotide sequences. We implemented our models using Python's Gurobi interface and showed the efficacy of our approach even for the largest proteins available. We also provide experiments showing that highly expressed genes have objective values close to the optimized values in the bi-objective codon design problem. AVAILABILITY AND IMPLEMENTATION: http://alpersen.bilkent.edu.tr/NetworkCodon.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
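For a single additive objective, such as a sum of codon-pair scores, dynamic programming finds the best path through the acyclic layered network. Layer i holds the synonymous codons for the i-th amino acid, and each edge between consecutive layers carries a pair score. This is a minimal sketch under hypothetical pair scores and a tiny subset of the codon table. Missing pairs are assumed to score 0. It does not cover the multi-objective formulations or additional restrictions described above, which the authors implement with Gurobi.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of the standard genetic code (amino acid -> synonymous codons).
CODONS: Dict[str, List[str]] = {"M": ["ATG"], "K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}

# Hypothetical codon-pair scores; pairs not listed are assumed to score 0.
PAIR_SCORE: Dict[Tuple[str, str], float] = {
    ("ATG", "AAA"): 0.2, ("ATG", "AAG"): 0.5,
    ("AAA", "TTT"): 0.4, ("AAA", "TTC"): -0.1,
    ("AAG", "TTT"): -0.3, ("AAG", "TTC"): 0.3,
}

def best_codon_path(
    protein: str,
    codons: Dict[str, List[str]],
    pair_score: Dict[Tuple[str, str], float],
) -> Tuple[float, str]:
    """Maximize the sum of consecutive codon-pair scores over all synonymous encodings."""
    # best[c] = (score, path) of the highest-scoring path that ends at codon c.
    best = {c: (0.0, [c]) for c in codons[protein[0]]}
    for aa in protein[1:]:
        layer = {}
        for c in codons[aa]:
            # Extend every path from the previous layer along the edge (prev -> c).
            candidates = [
                (score + pair_score.get((prev, c), 0.0), path + [c])
                for prev, (score, path) in best.items()
            ]
            layer[c] = max(candidates, key=lambda t: t[0])
        best = layer
    score, path = max(best.values(), key=lambda t: t[0])
    return score, "".join(path)

print(best_codon_path("MKF", CODONS, PAIR_SCORE))
```

The search keeps one best partial path per codon in each layer, so its cost grows linearly with protein length rather than exponentially. For "MKF", it returns the encoding ATGAAGTTC with a total pair score of about 0.8.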


Subjects
Amino Acids , Genetic Code , Codon , Amino Acid Sequence
7.
Bioinformatics ; 38(12): 3238-3244, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35512389

ABSTRACT

MOTIVATION: Identification and removal of micro-scale residual tumor tissue during brain tumor surgery are key for survival in glioma patients. For this goal, High-Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HRMAS NMR) spectroscopy-based assessment of tumor margins during surgery has been an effective method. However, the time required for metabolite quantification and the need for human experts such as a pathologist to be present during surgery are major bottlenecks of this technique. While machine learning techniques that analyze the NMR spectrum in an untargeted manner (i.e. using the full raw signal) have been shown to effectively automate this feedback mechanism, the high-dimensional and noisy structure of the NMR signal limits the attained performance. RESULTS: In this study, we show that identifying informative regions in the HRMAS NMR spectrum and using them for tumor margin assessment improves the prediction power. We use spectra normalized with the ERETIC (electronic reference to access in vivo concentrations) method, which uses an external reference signal to calibrate the HRMAS NMR spectrum. We train models to predict quantities of metabolites from annotated regions of this spectrum. Using these predictions for tumor margin assessment provides performance improvements of up to 4.6% in the Area Under the ROC Curve (AUC-ROC) and 2.8% in the Area Under the Precision-Recall Curve (AUC-PR). We validate the importance of various tumor biomarkers and identify a novel region between 7.97 ppm and 8.09 ppm as a new candidate glioma biomarker. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/targeted_brain_tumor_margin_assessment. The data underlying this article are available in Zenodo, at https://doi.org/10.5281/zenodo.5781769. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Brain Neoplasms , Glioma , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/surgery , Metabolomics/methods , Magnetic Resonance Spectroscopy/methods , Glioma/diagnostic imaging , Glioma/surgery , Magnetic Resonance Imaging
8.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2334-2344, 2022.
Article in English | MEDLINE | ID: mdl-34086576

ABSTRACT

Drug combination therapies have been a viable strategy for the treatment of complex diseases such as cancer due to increased efficacy and reduced side effects. However, experimentally validating all possible combinations for synergistic interaction, even with high-throughput screens, is intractable due to the vast combinatorial search space. Computational techniques can reduce the number of combinations to be evaluated experimentally by prioritizing promising candidates. We present MatchMaker, which predicts drug synergy scores using drug chemical structure information and gene expression profiles of cell lines in a deep learning framework. For the first time, our model utilizes the largest known drug combination dataset to date, DrugComb. We compare the performance of MatchMaker with the state-of-the-art models and observe up to ∼15% correlation and ∼33% mean squared error (MSE) improvements over the next best method. We investigate the cell types and drug pairs that are relatively harder to predict and present novel candidate pairs. MatchMaker is available at https://github.com/tastanlab/matchmaker.


Subjects
Deep Learning , Neoplasms , Computational Biology/methods , Drug Combinations , Drug Synergism , Humans , Neoplasms/genetics
9.
Bioinformatics ; 38(4): 908-917, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34864867

ABSTRACT

MOTIVATION: Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS: We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ∼10⁷ variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION: Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Phenotype , Linkage Disequilibrium , Genome, Human , Polymorphism, Single Nucleotide
10.
Bioinformatics ; 40(5)2022 Jan 01.
Article in English | MEDLINE | ID: mdl-38718189

ABSTRACT

MOTIVATION: Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. RESULTS: In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. AVAILABILITY AND IMPLEMENTATION: PDSP is available at https://github.com/hikuru/PDSP.

11.
Proc Priv Enhanc Technol ; 2021(3): 28-48, 2021.
Article in English | MEDLINE | ID: mdl-34746296

ABSTRACT

Sharing genome data in a privacy-preserving way is a major bottleneck for the scientific progress promised by the big data era in genomics. A community-driven protocol named the genomic data-sharing beacon protocol has been widely adopted for sharing genomic data. The system aims to provide a secure, easy-to-implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data-sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an attacker can use the inherent correlations in the genome and clustering techniques to run such an attack efficiently and accurately. We also show that even if multiple individuals are added to the beacon during the same update, it is possible to identify the victim's genome with high confidence using traits that are easily accessible to the attacker (e.g., eye color or hair type). Moreover, we show how a genome reconstructed using a beacon that is not associated with a sensitive phenotype can be used for membership inference attacks on beacons with sensitive phenotypes (e.g., HIV+). The outcome of this work will guide beacon operators on when and how to update the content of the beacon and help them (along with the beacon participants) make informed decisions.
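A toy model can show how the yes/no interface works and why an update leaks information. This sketch is a simplified assumption and not the paper's attack. The beacon is modelled as a set of (chromosome, position, alternate allele) tuples. An attacker who queries the same alleles before and after an update learns that any allele switching from "no" to "yes" must come from the newly added individual. Alleles that were already present before the update stay hidden in this toy. The attack described above additionally exploits correlations in the genome and clustering. The genomes and alleles are hypothetical.

```python
from typing import Iterable, Set, Tuple

Allele = Tuple[str, int, str]  # (chromosome, position, alternate allele)

class Beacon:
    """Toy beacon that only answers whether any participant carries an allele."""

    def __init__(self, genomes: Iterable[Set[Allele]] = ()) -> None:
        self._alleles: Set[Allele] = set()
        for genome in genomes:
            self.add(genome)

    def add(self, genome: Set[Allele]) -> None:
        self._alleles |= genome

    def query(self, allele: Allele) -> bool:
        return allele in self._alleles

def newly_present(before: Beacon, after: Beacon, candidates: Iterable[Allele]) -> Set[Allele]:
    """Alleles answered 'no' before an update and 'yes' after it."""
    return {a for a in candidates if not before.query(a) and after.query(a)}

# Hypothetical genomes, each reduced to the set of alternate alleles it carries.
alice = {("1", 1000, "A"), ("2", 500, "T")}
bob = {("1", 1000, "A"), ("3", 42, "G")}
victim = {("1", 1000, "A"), ("3", 42, "G"), ("5", 7, "C")}

before = Beacon([alice, bob])
after = Beacon([alice, bob, victim])
candidates = [("1", 1000, "A"), ("2", 500, "T"), ("3", 42, "G"), ("5", 7, "C"), ("6", 9, "A")]
print(newly_present(before, after, candidates))  # {('5', 7, 'C')}
```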

12.
Genome Biol ; 22(1): 252, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34465366

ABSTRACT

Detecting multiplets in single-nucleus (sn)ATAC-seq data is challenging due to data sparsity and limited dynamic range. AMULET (ATAC-seq MULtiplet Estimation Tool) enumerates regions with greater than two uniquely aligned reads across the genome to effectively detect multiplets. We evaluate the method by generating snATAC-seq data from human blood and pancreatic islet samples. Compared with alternatives, AMULET has high precision, estimated via donor-based multiplexing, and high recall, estimated via simulated multiplets. It identifies multiplets most effectively when a read depth of 25K median valid reads per nucleus is achieved.
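The premise can be checked per cell barcode. A diploid nucleus has at most two copies of each autosomal locus, so more than two reads overlapping the same position suggest that the barcode captured more than one nucleus. The sweep-line sketch below finds such regions for a single barcode. The read intervals are hypothetical. A practical caller would need further steps, such as excluding repetitive regions and deciding how many such regions to tolerate, and those steps are omitted here.

```python
from typing import Dict, List, Optional, Tuple

Read = Tuple[str, int, int]  # (chromosome, start, end), half-open interval [start, end)

def overlap_regions(reads: List[Read], min_reads: int = 3) -> List[Read]:
    """Return regions where at least `min_reads` reads of one barcode overlap."""
    events: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in reads:
        events.setdefault(chrom, []).extend([(start, 1), (end, -1)])
    regions: List[Read] = []
    for chrom, evs in events.items():
        # At equal coordinates, ends (-1) sort before starts (+1), so touching reads don't overlap.
        evs.sort()
        depth = 0
        region_start: Optional[int] = None
        for pos, delta in evs:
            depth += delta
            if depth >= min_reads and region_start is None:
                region_start = pos
            elif depth < min_reads and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions

# Hypothetical reads from one barcode: three of them overlap on chr1:180-200.
reads = [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 180, 300), ("chr1", 400, 500)]
print(overlap_regions(reads))  # [('chr1', 180, 200)]
```

Counting these regions for every barcode gives a per-nucleus statistic. Barcodes with unusually many such regions are then multiplet candidates.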


Subjects
Chromatin Immunoprecipitation Sequencing , Software , Aged , DNA/genetics , Humans , Leukocytes, Mononuclear/metabolism , Likelihood Functions , Transposases/metabolism
13.
Nat Commun ; 12(1): 1177, 2021 02 19.
Article in English | MEDLINE | ID: mdl-33608514

ABSTRACT

Mass spectrometry enables high-throughput screening of phosphoproteins across a broad range of biological contexts. When complemented by computational algorithms, phosphoproteomic data allow the inference of kinase activity, facilitating the identification of dysregulated kinases in various diseases including cancer, Alzheimer's disease and Parkinson's disease. To enhance the reliability of kinase activity inference, we present a network-based framework, RoKAI, that integrates various sources of functional information to capture coordinated changes in signaling. Through computational experiments, we show that phosphorylation of sites in the functional neighborhood of a kinase is significantly predictive of its activity. The incorporation of this knowledge in RoKAI consistently enhances the accuracy of kinase activity inference methods while making them more robust to missing annotations and quantifications. This enables the identification of understudied kinases and will likely lead to the development of novel kinase inhibitors for targeted therapy of many diseases. RoKAI is available as a web-based tool at http://rokai.io .


Subjects
Computational Biology/methods , Metabolic Networks and Pathways , Phosphotransferases/metabolism , Signal Transduction/physiology , Algorithms , Alzheimer Disease/metabolism , Gene Regulatory Networks/physiology , Humans , Mass Spectrometry , Neoplasms/metabolism , Parkinson Disease/metabolism , Phosphoproteins , Phosphorylation , Phosphotransferases/genetics , Proteomics/methods , Reproducibility of Results , Software , Systems Biology/methods
14.
Biosens Bioelectron ; 178: 113028, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33508538

ABSTRACT

Whole cell biosensors (WCBs) have become prominent in many fields, from environmental analysis to biomedical diagnostics, thanks to advanced genetic circuit design principles. Despite increasing demand for cost-effective and easy-to-use assessment methods, a considerable number of WCBs retain certain drawbacks such as long response time and low precision and accuracy. Here, we utilized a neural network-based architecture to improve the features of WCBs and engineered a gold-sensing WCB that has a long response time (18 h). Two Long Short-Term Memory (LSTM)-based networks were integrated to assess the ON/OFF and the concentration-dependent states of the sensor output, respectively. We demonstrated that the binary (ON/OFF) network was able to distinguish between ON/OFF states as early as 30 min with 78% accuracy, and with over 98% accuracy in 3 h. Furthermore, when the output was analyzed in an analog manner, we demonstrated that the network can classify the raw fluorescence data into pre-defined analyte concentration groups with high precision (82%) in 3 h. This approach can be applied to a wide range of WCBs and improve rapidness, simplicity and accuracy, which are the main challenges in synthetic biology-enabled biosensing.


Subjects
Biosensing Techniques , Gene Regulatory Networks , Machine Learning , Neural Networks, Computer , Synthetic Biology
15.
IEEE/ACM Trans Comput Biol Bioinform ; 18(3): 1208-1216, 2021.
Article in English | MEDLINE | ID: mdl-31443041

ABSTRACT

Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci that are associated with or predictive of the phenotype. Selecting connected SNPs on SNP-SNP networks has proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs are selected and provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster.
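The greedy scheme behind such guarantees can be sketched generically. For a monotone submodular objective under a cardinality constraint, repeatedly adding the element with the largest marginal gain achieves at least a (1 - 1/e) fraction of the optimum. The objective below is a weighted coverage function chosen as a stand-in and is not SPADIS's actual set function. Each SNP is assumed to "cover" some hypothetical regions, so a SNP that is redundant with an already selected one adds little.

```python
from typing import Dict, List, Optional, Set

def coverage(selected: List[str], covers: Dict[str, Set[str]], weight: Dict[str, float]) -> float:
    """Weighted coverage: total weight of regions covered by at least one selected SNP."""
    covered: Set[str] = set().union(*(covers[s] for s in selected))
    return sum(weight[r] for r in covered)

def greedy_select(
    candidates: List[str], k: int, covers: Dict[str, Set[str]], weight: Dict[str, float]
) -> List[str]:
    """Pick up to k SNPs, each time adding the one with the largest marginal gain."""
    selected: List[str] = []
    for _ in range(k):
        base = coverage(selected, covers, weight)
        best: Optional[str] = None
        best_gain = float("-inf")
        for c in candidates:
            if c in selected:
                continue
            gain = coverage(selected + [c], covers, weight) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # fewer than k candidates available
            break
        selected.append(best)
    return selected

# Hypothetical: s1 and s2 cover the same regions (redundant); s3 covers a different one.
covers = {"s1": {"A", "B"}, "s2": {"A", "B"}, "s3": {"C"}}
weight = {"A": 1.0, "B": 1.0, "C": 1.5}
print(greedy_select(["s1", "s2", "s3"], 2, covers, weight))  # ['s1', 's3']
```

After s1 is selected, s2 has zero marginal gain, so the greedy step prefers s3. The diminishing-returns property steers the selection toward complementary loci.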


Subjects
Algorithms , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Arabidopsis/genetics , Genes, Plant/genetics , Genomics/methods , Sequence Analysis, DNA/methods
16.
IEEE/ACM Trans Comput Biol Bioinform ; 18(4): 1474-1480, 2021.
Article in English | MEDLINE | ID: mdl-31581093

ABSTRACT

Genome-scale reconstructed metabolic networks have provided an organism-specific understanding of cellular processes and their relations to phenotype. As they are deemed essential to study metabolism, the number of organisms with reconstructed metabolic networks continues to increase. This everlasting research interest has led to the development of online systems/repositories that store existing reconstructions and enable new model generation, integration, and constraint-based analyses. While features that support model reconstruction are widely available, current systems lack the means to help users who are interested in analyzing the topology of the reconstructed networks. Here, we present the Database of Reconstructed Metabolic Networks (DORMAN). DORMAN is a centralized online database that stores SBML-based reconstructed metabolic networks published in the literature and provides web-based computational tools for visualizing and analyzing the model topology. Novel features of DORMAN are (i) an interactive visualization interface that allows rendering of the complete network as well as editing and exporting the model, (ii) hierarchical navigation that provides efficient access to connected entities in the model, (iii) a built-in query interface that allows posing topological queries, and finally (iv) a model comparison tool that enables comparing models with different nomenclatures, using approximate string matching. DORMAN is online and freely accessible at http://ciceklab.cs.bilkent.edu.tr/dorman.
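The abstract does not specify which approximate string matching method DORMAN uses. As one common option, the sketch below matches entity names across two models by Levenshtein edit distance after normalizing case and punctuation. The metabolite names and the distance threshold are hypothetical.

```python
import re
from typing import Dict, List

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = cur
    return prev[-1]

def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumerics so 'D-Glucose' and 'D_glucose' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def match_names(names_a: List[str], names_b: List[str], max_dist: int) -> Dict[str, str]:
    """Map each name in model A to its closest name in model B, if within max_dist edits."""
    matches: Dict[str, str] = {}
    for a in names_a:
        dist, best = min((levenshtein(normalize(a), normalize(b)), b) for b in names_b)
        if dist <= max_dist:
            matches[a] = best
    return matches

print(levenshtein("kitten", "sitting"))  # 3
print(match_names(["D-Glucose", "L-Lactate"], ["D_glucose", "L-lactic acid", "Pyruvate"], 2))
```

A small threshold avoids false matches such as lactate vs. lactic acid. Synonyms that differ substantially would still need a curated mapping.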


Subjects
Databases, Genetic , Metabolic Networks and Pathways , Metabolomics/methods , Algorithms , Internet , Software
17.
J Comput Biol ; 28(4): 378-380, 2021 04.
Article in English | MEDLINE | ID: mdl-33325775

ABSTRACT

Detecting interacting loci pairs has been instrumental in understanding disease etiology when single-locus associations do not fully account for the underlying heritability. However, the number of loci to test is prohibitively large. Epistasis test prioritization algorithms rank likely epistatic single nucleotide polymorphism (SNP) pairs to limit the number of statistical tests. Potpourri detects epistatic SNP pairs by diversifying the selected SNPs' genomic regions and investigating their co-occurrence patterns over the case cohort. It can also take SNPs in regulatory or coding regions as input and prioritize them further. The program identifies and returns a list of prioritized SNP pairs for epistasis testing. This article describes how to use the program and the details of the input and output data.


Subjects
Epistasis, Genetic/genetics , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Genome/genetics , Humans
18.
J Comput Biol ; 28(4): 365-377, 2021 04.
Article in English | MEDLINE | ID: mdl-33275856

ABSTRACT

Genome-wide association studies (GWAS) explain a fraction of the underlying heritability of genetic diseases. Investigating epistatic interactions between two or more loci helps to close this gap. Unfortunately, the sheer number of loci combinations to process and hypotheses to test makes the process prohibitive both computationally and statistically. Epistasis test prioritization algorithms rank likely epistatic single nucleotide polymorphism (SNP) pairs to limit the number of tests. However, they still suffer from very low precision. It was shown in the literature that selecting SNPs that are individually correlated with the phenotype and also diverse with respect to genomic location leads to better phenotype prediction due to genetic complementation. Here, we propose that an algorithm that pairs SNPs from such diverse regions and ranks them can improve prediction power. We propose an epistasis test prioritization algorithm that optimizes a submodular set function to select a diverse and complementary set of genomic regions that span the underlying genome. The SNP pairs from these regions are then further ranked with respect to their co-coverage of the case cohort. We compare our algorithm with the state of the art on three GWAS and show that we (1) substantially improve precision (from 0.003 to 0.652) while maintaining the significance of selected pairs, (2) decrease the number of tests by 25-fold, and (3) decrease the runtime by 4-fold. We also show that promoting SNPs from regulatory/coding regions improves the performance (up to 0.8). Potpourri is available at http://ciceklab.cs.bilkent.edu.tr/potpourri.
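The abstract does not define co-coverage precisely. The sketch below assumes that it counts the case individuals who carry the minor allele at both SNPs of a pair, and ranks candidate pairs by that count. The SNP identifiers and genotypes are hypothetical. The sketch also skips the submodular region-selection step that precedes the ranking.

```python
from itertools import combinations
from typing import List, Set, Tuple

def rank_pairs_by_co_coverage(
    cases: List[Set[str]], snps: List[str]
) -> List[Tuple[Tuple[str, str], int]]:
    """Rank SNP pairs by how many cases carry the minor allele at both SNPs."""
    scored = []
    for a, b in combinations(snps, 2):
        co_covered = sum(1 for carried in cases if a in carried and b in carried)
        scored.append(((a, b), co_covered))
    # Highest co-coverage first; ties broken by SNP identifiers for a deterministic order.
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored

# Each case is represented by the set of SNPs at which it carries the minor allele.
cases = [{"rs1", "rs2"}, {"rs1", "rs2", "rs3"}, {"rs3"}, {"rs1", "rs3"}]
for pair, count in rank_pairs_by_co_coverage(cases, ["rs1", "rs2", "rs3"]):
    print(pair, count)
```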


Subjects
Epistasis, Genetic/genetics , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Genomics/statistics & numerical data , Humans , Quantitative Trait Loci/genetics
19.
Bioinformatics ; 36(Suppl_2): i903-i910, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381836

ABSTRACT

MOTIVATION: The big data era in genomics promises a breakthrough in medicine, but sharing data in a private manner limits the pace of the field. The widely accepted 'genomic data sharing beacon' protocol provides a standardized and secure interface for querying genomic datasets. The data are only shared if the desired information (e.g. a certain variant) exists in the dataset. Various studies showed that beacons are vulnerable to re-identification (or membership inference) attacks. As beacons are generally associated with sensitive phenotype information, re-identification creates a significant risk for the participants. Unfortunately, proposed countermeasures against such attacks have failed to be effective, as they do not consider the utility of the beacon protocol. RESULTS: In this study, for the first time, we analyze the mitigation effect of kinship relationships among beacon participants against re-identification attacks. We argue that having multiple family members in a beacon can garble the information for attacks, since a substantial number of variants are shared among kin-related people. Using family genomes from HapMap and synthetically generated datasets, we show that having one of the parents of a victim in the beacon causes (i) a significant decrease in the power of attacks and (ii) a substantial increase in the number of queries needed to confirm an individual's beacon membership. We also show how the protection effect attenuates when more distant relatives, such as grandparents, are included alongside the victim. Furthermore, we quantify the utility loss due to adding relatives and show that it is smaller compared with flipping-based techniques.


Subjects
Genomics , Information Dissemination , Family , Phenotype , Humans
20.
Acta Gastroenterol Belg ; 83(4): 565-570, 2020.
Article in English | MEDLINE | ID: mdl-33321012

ABSTRACT

BACKGROUND: Nonalcoholic fatty liver disease (NAFLD) is among the most common causes of chronic liver disease and cirrhosis. In NAFLD, the histological pattern of steatosis is usually macrovesicular (MacroS), but it may be accompanied by varying degrees of microvesicular steatosis (MicroS). Thus, in this study, we aimed to evaluate the prevalence and significance of MicroS in subjects with NAFLD. METHODS: A retrospective analysis of clinical and laboratory data of patients with histologically proven NAFLD was performed. The liver biopsy specimens, stained with hematoxylin-eosin, reticulin, and Masson's trichrome stains, were evaluated by a single expert liver pathologist. Scoring and semiquantitative assessment of steatosis and NAFLD severity were performed according to the Kleiner scale, known as the NAFLD activity score (NAS). Steatosis grade, steatosis type, zonal distribution of steatosis and other histological findings were also determined. RESULTS: The prevalence of MicroS in the study population (n = 191) was 30.4%. There was no difference in demographic and biochemical parameters between patients with and without MicroS. On the other hand, the prevalence of ballooning injury and megamitochondria was higher in patients with MicroS (p = 0.019 and p = 0.036, respectively). There was a significant association of MicroS with ballooning injury (OR 2.65, 95% CI 1.26-5.55; p = 0.005) and with the presence of megamitochondria (OR 3.72, 95% CI 1.00-13.72; p = 0.037). CONCLUSION: MicroS is common in patients with NAFLD and is associated with early histological findings in this clinically relevant condition. Further longitudinal studies are needed to characterize the role of MicroS in the natural history of NAFLD.


Subjects
Non-alcoholic Fatty Liver Disease , Biopsy , Humans , Liver/pathology , Liver Cirrhosis/pathology , Non-alcoholic Fatty Liver Disease/diagnosis , Non-alcoholic Fatty Liver Disease/epidemiology , Non-alcoholic Fatty Liver Disease/pathology , Retrospective Studies , Severity of Illness Index